The extragradient method has recently gained increasing attention, due to its convergence behavior on smooth games. In $n$-player differentiable games, the eigenvalues of the Jacobian of the vector field are distributed on the complex plane, exhibiting more convoluted dynamics than classical (i.e., single player) minimization. In this work, we carry out a polynomial-based analysis of the extragradient with momentum for optimizing games with \emph{cross-shaped} Jacobian spectrum on the complex plane. We show two results. First, depending on the hyperparameter setup, the extragradient with momentum exhibits three different modes of convergence: when the eigenvalues are distributed $i)$ on the real line, $ii)$ on the real line together with complex conjugates, and $iii)$ only as complex conjugates. Second, we focus on the case $ii)$, i.e., when the eigenvalues of the Jacobian have \emph{cross-shaped} structure, as observed in training generative adversarial networks. For this problem class, we derive the optimal hyperparameters of the momentum extragradient method, and show that it achieves an accelerated convergence rate.
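As a rough illustration of the method this abstract discusses, the sketch below runs an extragradient update with a heavy-ball momentum term on the bilinear game $\min_x \max_y xy$. The update form, the function names, and the parameter names (`h` for the step size, `gamma` for the extrapolation step, `m` for momentum) are assumptions made for illustration, not the paper's notation or its optimal hyperparameters. The Jacobian of this toy vector field has purely imaginary eigenvalues $\pm i$, so it corresponds to case $iii)$ rather than the cross-shaped case.

```python
import math


def vector_field(w):
    """Vector field of the bilinear game min_x max_y x*y: v(x, y) = (y, -x)."""
    x, y = w
    return (y, -x)


def momentum_extragradient(w0, h, gamma, m, steps, field=vector_field):
    """Assumed update: w_next = w - h * v(w - gamma * v(w)) + m * (w - w_prev).

    Setting gamma = 0 and m = 0 recovers plain simultaneous gradient
    descent-ascent, which is known to diverge on this game.
    """
    w_prev, w = w0, w0
    for _ in range(steps):
        g = field(w)
        # Extrapolation (look-ahead) point.
        w_half = tuple(wi - gamma * gi for wi, gi in zip(w, g))
        g_half = field(w_half)
        # Update from the look-ahead gradient plus the momentum term.
        w_next = tuple(
            wi - h * gi + m * (wi - wpi)
            for wi, gi, wpi in zip(w, g_half, w_prev)
        )
        w_prev, w = w, w_next
    return w


def norm(w):
    """Euclidean distance of a 2-D iterate from the equilibrium (0, 0)."""
    return math.hypot(*w)


if __name__ == "__main__":
    start = (1.0, 1.0)
    print("extragradient + momentum:", norm(momentum_extragradient(start, 0.5, 0.5, 0.1, 300)))
    print("plain descent-ascent:    ", norm(momentum_extragradient(start, 0.5, 0.0, 0.0, 50)))
```

With these illustrative settings the extragradient iterates shrink toward the equilibrium at the origin, while the run with `gamma = 0` spirals outward.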
The ability to dynamically adapt neural networks to newly-available data without performance deterioration would revolutionize deep learning applications. Streaming learning (i.e., learning from one data example at a time) has the potential to enable such real-time adaptation, but current approaches i) freeze a majority of network parameters during streaming and ii) are dependent upon offline, base initialization procedures over large subsets of data, which damages performance and limits applicability. To mitigate these shortcomings, we propose Cold Start Streaming Learning (CSSL), a simple, end-to-end approach for streaming learning with deep networks that uses a combination of replay and data augmentation to avoid catastrophic forgetting. Because CSSL updates all model parameters during streaming, the algorithm is capable of beginning streaming from a random initialization, making base initialization optional. Going further, the algorithm's simplicity allows theoretical convergence guarantees to be derived using analysis of the Neural Tangent Random Feature (NTRF). We find that CSSL outperforms existing baselines for streaming learning in experiments on CIFAR100, ImageNet, and Core50 datasets. Additionally, we propose a novel multi-task streaming learning setting and show that CSSL performs favorably in this domain. Put simply, CSSL performs well and demonstrates that the complicated, multi-step training pipelines adopted by most streaming methodologies can be replaced with a simple, end-to-end learning approach without sacrificing performance.
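Centroid-based clustering methods such as k-means, k-medoids, and k-centers are heavily applied as go-to tools in exploratory data analysis. In many cases, these methods are used to obtain representative centroids of the data manifold in order to visualize or summarize a dataset. Real-world datasets often contain inherent abnormalities, such as repeated samples and sampling bias, that manifest as imbalanced clustering. We propose to remedy this by introducing a maximal radius constraint $r$ on the clusters formed by the centroids, i.e., samples from the same cluster should not be more than $2r$ apart in terms of $\ell_2$ distance. We achieve this constraint by solving a semi-definite program, followed by a linear assignment problem with quadratic constraints. Through qualitative results, we show that our proposed method is robust to dataset imbalances and sampling artifacts. To the best of our knowledge, ours is the first constrained k-means clustering method with hard radius constraints. Code: https://bit.ly/kmeans限制的代码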
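The radius constraint stated in this abstract can be written down as a simple check on a given cluster assignment. The sketch below is only a verifier, not the semi-definite programming method the abstract describes, and the function name `satisfies_radius_constraint` is a hypothetical helper introduced for illustration.

```python
import math
from itertools import combinations


def satisfies_radius_constraint(points, labels, r):
    """Return True if every pair of points sharing a label is at most 2*r apart.

    points: sequence of equal-length coordinate tuples
    labels: sequence of cluster ids, one per point
    r:      maximal cluster radius (ell_2 distance)
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    for members in clusters.values():
        for p, q in combinations(members, 2):
            if math.dist(p, q) > 2 * r:
                return False
    return True


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
    print(satisfies_radius_constraint(pts, [0, 0, 1], r=0.5))  # pair distance 1.0 <= 2r
    print(satisfies_radius_constraint(pts, [0, 0, 0], r=0.5))  # far point violates 2r
```

The check is quadratic in the cluster size, which is acceptable for inspecting results but says nothing about how to find an assignment that satisfies the constraint.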
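We propose a novel, structured pruning algorithm for neural networks: the iterative, Sparse Structured Pruning algorithm, dubbed i-SpaSP. Inspired by ideas from sparse signal recovery, i-SpaSP operates by iteratively identifying a larger set of important parameter groups (e.g., filters or neurons) within a network that contribute most to the residual between pruned and dense network output, then thresholding these groups based on a smaller, pre-defined pruning ratio. For both two-layer and multi-layer network architectures with ReLU activations, we show that the error induced by pruning with i-SpaSP decays polynomially, where the degree of this polynomial becomes arbitrarily large based on the sparsity of the dense network's hidden representations. In our experiments, i-SpaSP is evaluated across a variety of datasets (i.e., MNIST and ImageNet) and architectures (i.e., feed-forward networks, ResNet34, and MobileNetV2), where it is shown to discover high-performing sub-networks and to improve upon the pruning efficiency of provable baseline methodologies by several orders of magnitude. Put simply, i-SpaSP is easy to implement with automatic differentiation, achieves strong empirical results, comes with theoretical convergence guarantees, and is efficient, thus distinguishing itself as one of the few computationally efficient, practical, and provable pruning algorithms.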
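Given a dense shallow neural network, we focus on iteratively creating, training, and combining randomly selected subnetworks (surrogate functions) in order to train the full model. By carefully analyzing $i)$ the subnetworks' neural tangent kernel, $ii)$ the surrogate functions' gradient, and $iii)$ how we sample and combine the surrogate functions, we prove a linear convergence rate of the training error, within an error region, for an overparameterized single-hidden-layer perceptron with ReLU activations on a regression task. Our result implies that, for a fixed neuron selection probability, the error term decreases as we increase the number of surrogate models, and increases as we increase the number of local training steps for each selected subnetwork. The considered framework generalizes and provides new insights on dropout training, multi-sample dropout training, as well as independent subnet training; for each case, we provide corresponding convergence results as corollaries of our main theorem.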
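Stochastic gradient descent with momentum (SGDM) is the dominant algorithm in many optimization scenarios, including convex optimization instances and non-convex neural network training. Yet, in the stochastic setting, momentum interferes with gradient noise, often requiring specific step size and momentum choices in order to guarantee convergence, let alone acceleration. Proximal point methods, on the other hand, have gained much attention due to their numerical stability and resilience against imperfect tuning. Their stochastic accelerated variants, though, have received limited attention: how momentum interacts with the stability of (stochastic) proximal point methods remains largely unstudied. To address this, we focus on the convergence and stability of the stochastic proximal point algorithm with momentum (SPPAM), and show that SPPAM allows a faster linear convergence rate than the stochastic proximal point algorithm (SPPA), with a better contraction factor, under proper hyperparameter tuning. In terms of stability, we show that SPPAM depends on problem constants more favorably than SGDM, allowing a wider range of step sizes and momentum values that lead to convergence.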
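The stability contrast described in this abstract can be seen on a toy problem. The sketch below is a minimal illustration under stated assumptions: the objective is a one-dimensional least-squares problem with per-sample loss $f_i(x) = \tfrac{1}{2}(a_i x - b_i)^2$, and the implicit update is assumed to take the form $x_{t+1} = x_t - \eta \nabla f_i(x_{t+1}) + \beta (x_t - x_{t-1})$, which has a closed-form solution for this loss. The function name `run` and all constants are hypothetical and are not taken from the paper.

```python
import random


def run(samples, eta, beta, steps, implicit, x0=0.0, seed=0):
    """Minimize the average of 0.5 * (a*x - b)^2 over samples (a, b), x scalar.

    implicit=True : proximal-point style step with momentum (assumed form),
                    solved in closed form for the quadratic loss.
    implicit=False: explicit stochastic gradient step with heavy-ball momentum.
    """
    rng = random.Random(seed)
    x_prev, x = x0, x0
    for _ in range(steps):
        a, b = rng.choice(samples)
        momentum = beta * (x - x_prev)
        if implicit:
            # Solve x_next = x - eta * a * (a * x_next - b) + momentum for x_next.
            x_next = (x + momentum + eta * a * b) / (1.0 + eta * a * a)
        else:
            x_next = x - eta * a * (a * x - b) + momentum
        x_prev, x = x, x_next
    return x


if __name__ == "__main__":
    x_star = 3.0
    data = [(a, a * x_star) for a in (2.0, 2.5, 3.0)]  # consistent, noise-free system
    print("implicit:", run(data, eta=1.0, beta=0.3, steps=200, implicit=True))
    print("explicit:", run(data, eta=1.0, beta=0.3, steps=50, implicit=False))
```

With the deliberately large step size `eta = 1.0`, the implicit iteration settles at the solution, while the explicit one blows up. This shows only one instance of the wider admissible hyperparameter range the abstract refers to and makes no claim about rates.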
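Neural network pruning is useful for discovering efficient, high-performing subnetworks within pre-trained, dense network architectures. More often than not, however, it involves a three-step process (pre-training, pruning, and re-training) that is computationally expensive, because the dense model must be fully pre-trained. Fortunately, several works have shown empirically that high-performing subnetworks can be discovered via pruning without fully pre-training the dense network. Aiming to theoretically analyze the amount of dense network pre-training needed for a pruned network to perform well, we derive a theoretical bound on the number of SGD pre-training iterations on a two-layer, fully-connected network, beyond which pruning via greedy forward selection yields a subnetwork that achieves good training error. This threshold is shown to depend logarithmically on the size of the dataset, meaning that experiments with larger datasets require more pre-training for subnetworks obtained via pruning to perform well. We empirically demonstrate the validity of our theoretical results across a variety of architectures and datasets, including fully-connected networks trained on MNIST and several deep convolutional neural network (CNN) architectures trained on CIFAR10 and ImageNet.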
Merging satellite products and ground-based measurements is often required for obtaining precipitation datasets that simultaneously cover large regions with high density and are more accurate than pure satellite precipitation products. Machine and statistical learning regression algorithms are regularly utilized in this endeavour. At the same time, tree-based ensemble algorithms for regression are adopted in various fields for solving algorithmic problems with high accuracy and low computational cost. The latter can constitute a crucial factor for selecting algorithms for satellite precipitation product correction at the daily and finer time scales, where the size of the datasets is particularly large. Still, information on which tree-based ensemble algorithm to select in such a case for the contiguous United States (US) is missing from the literature. In this work, we conduct an extensive comparison between three tree-based ensemble algorithms, specifically random forests, gradient boosting machines (gbm) and extreme gradient boosting (XGBoost), in the context of interest. We use daily data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and the IMERG (Integrated Multi-satellitE Retrievals for GPM) gridded datasets. We also use earth-observed precipitation data from the Global Historical Climatology Network daily (GHCNd) database. The experiments refer to the entire contiguous US and additionally include the application of the linear regression algorithm for benchmarking purposes. The results suggest that XGBoost is the best-performing tree-based ensemble algorithm among those compared. They also suggest that IMERG is more useful than PERSIANN in the context investigated.
Being able to forecast the popularity of new garment designs is very important in an industry as fast paced as fashion, both in terms of profitability and reducing the problem of unsold inventory. Here, we attempt to address this task in order to provide informative forecasts to fashion designers within a virtual reality designer application that will allow them to fine-tune their creations based on current consumer preferences within an interactive and immersive environment. To achieve this, we have to deal with the following central challenges: (1) the proposed method should not hinder the creative process and thus has to rely only on the garment's visual characteristics, (2) a new garment lacks historical data from which to extrapolate its future popularity, and (3) fashion trends in general are highly dynamic. To this end, we develop a computer vision pipeline fine-tuned on fashion imagery in order to extract relevant visual features along with the category and attributes of the garment. We propose a hierarchical label sharing (HLS) pipeline for automatically capturing hierarchical relations among fashion categories and attributes. Moreover, we propose MuQAR, a Multimodal Quasi-AutoRegressive neural network that forecasts the popularity of new garments by combining their visual features and categorical features, while an autoregressive neural network models the popularity time series of the garment's category and attributes. Both the proposed HLS and MuQAR prove capable of surpassing the current state-of-the-art on key benchmark datasets: DeepFashion for image classification and VISUELLE for new garment sales forecasting.
Although many machine learning methods, especially from the field of deep learning, have been instrumental in addressing challenges within robotic applications, we cannot take full advantage of such methods before these can provide performance and safety guarantees. The lack of trust that impedes the use of these methods mainly stems from a lack of human understanding of what exactly machine learning models have learned, and how robust their behaviour is. This is the problem the field of explainable artificial intelligence aims to solve. Based on insights from the social sciences, we know that humans prefer contrastive explanations, i.e.\ explanations answering the hypothetical question "what if?". In this paper, we show that linear model trees are capable of producing answers to such questions, so-called counterfactual explanations, for robotic systems, including in the case of multiple, continuous inputs and outputs. We demonstrate the use of this method to produce counterfactual explanations for two robotic applications. Additionally, we explore the issue of infeasibility, which is of particular interest in systems governed by the laws of physics.